Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild
3D pose estimation is an invaluable task in computer vision with various
practical applications. In particular, 3D multi-person pose estimation from a
monocular video (3DMPPE) is especially challenging and remains largely
uncharted, far from being applicable to in-the-wild scenarios. We identify
three unresolved issues with existing methods: lack of robustness to views
unseen during training, vulnerability to occlusion, and severe jittering in the
output. As a remedy, we propose POTR-3D, the first realization of a
sequence-to-sequence 2D-to-3D lifting model for 3DMPPE, powered by a novel
geometry-aware data augmentation strategy capable of generating unlimited data
with a variety of views while accounting for the ground plane and occlusions.
Through extensive experiments, we verify that the proposed model and data
augmentation robustly generalize to diverse unseen views, robustly recover
poses against heavy occlusions, and reliably produce smoother, more natural
outputs. The effectiveness of our approach is verified not only by
state-of-the-art performance on public benchmarks, but also by qualitative
results on more challenging in-the-wild videos. Demo videos are
available at https://www.youtube.com/@potr3d.
Comment: Published at ICCV 2023
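The geometry-aware augmentation is described only at a high level above. As a minimal sketch of the general idea, not the authors' actual procedure, one can synthesize unlimited 2D-3D pairs by keeping ground-truth multi-person 3D poses on a shared ground plane (y = 0) and projecting them through randomly sampled virtual cameras; all helper names and parameter ranges below are hypothetical.

```python
import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """World-to-camera rotation for a camera at cam_pos looking at target."""
    z = target - cam_pos
    z /= np.linalg.norm(z)
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    return np.stack([x, np.cross(z, x), z])      # rows are camera axes, (3, 3)

def sample_view(dist=(4.0, 8.0), pitch=(0.1, 0.6)):
    """Sample a random camera on a sphere around the scene origin."""
    yaw = np.random.uniform(-np.pi, np.pi)
    ph, d = np.random.uniform(*pitch), np.random.uniform(*dist)
    pos = d * np.array([np.cos(ph) * np.sin(yaw), np.sin(ph),
                        np.cos(ph) * np.cos(yaw)])
    return pos, look_at(pos)

def augment(poses_world, f=1000.0, c=500.0):
    """poses_world: (T, P, J, 3) joints of P people over T frames, feet on the
    y = 0 ground plane. Returns 2D inputs and camera-space 3D targets."""
    pos, R = sample_view()
    pc = (poses_world - pos) @ R.T               # world -> camera coordinates
    uv = f * pc[..., :2] / pc[..., 2:3] + c      # pinhole projection to pixels
    # pc[..., 2] is depth; it can be used to mask joints of a person standing
    # behind another, simulating the heavy occlusions the paper targets.
    return uv, pc
```

Feeding such pairs to a sequence-to-sequence lifting model exposes it to camera views that never appear in the original motion-capture data.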
Towards Efficient Neural Scene Graphs by Learning Consistency Fields
Neural Radiance Fields (NeRF) achieves photo-realistic image rendering from
novel views, and Neural Scene Graphs (NSG) \cite{ost2021neural} extends it
to dynamic scenes (videos) with multiple objects. Nevertheless, the
computationally heavy ray marching required for every frame becomes a huge
burden. In this paper,
taking advantage of significant redundancy across adjacent frames in videos, we
propose a feature-reusing framework. A first attempt at naively reusing
NSG features, however, teaches us that it is crucial to disentangle
object-intrinsic properties, which stay consistent across frames, from
transient ones. Our
proposed method, \textit{Consistency-Field-based NSG (CF-NSG)}, reformulates
neural radiance fields to additionally consider \textit{consistency fields}.
With disentangled representations, CF-NSG takes full advantage of the
feature-reusing scheme and performs an extended degree of scene manipulation in
a more controllable manner. We empirically verify that CF-NSG greatly improves
inference efficiency, using 85\% fewer queries than NSG without notable
degradation in rendering quality. Code will be available at:
https://github.com/ldynx/CF-NSG
Comment: BMVC 2022, 22 pages
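The abstract does not spell out CF-NSG's reformulation, so the toy sketch below only illustrates the feature-reusing scheme it motivates: an expensive branch whose output is consistent across frames is computed once per object and cached, while only a cheap transient branch is re-evaluated per frame. The module shapes and the per-object cache key are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FeatureReusingField(nn.Module):
    """Toy radiance field splitting object-intrinsic (cacheable) features
    from transient (per-frame) ones, loosely in the spirit of CF-NSG."""

    def __init__(self, dim=64):
        super().__init__()
        self.consistent = nn.Sequential(        # expensive, frame-consistent
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.transient = nn.Sequential(         # cheap, frame-dependent
            nn.Linear(dim + 1, dim), nn.ReLU())
        self.head = nn.Linear(dim, 4)           # RGB + density
        self.cache = {}                         # obj_id -> cached features

    @torch.no_grad()
    def render_points(self, obj_id, x_canonical, t):
        # x_canonical: (N, 3) sample points in the object's canonical frame;
        # assuming the same canonical samples recur across frames, the
        # consistent branch runs once and is then reused.
        if obj_id not in self.cache:
            self.cache[obj_id] = self.consistent(x_canonical)
        feats = self.cache[obj_id]
        time = torch.full((feats.shape[0], 1), float(t))
        return self.head(self.transient(torch.cat([feats, time], dim=-1)))

field = FeatureReusingField()
pts = torch.rand(1024, 3)
out0 = field.render_points("car_0", pts, t=0.00)  # computes and caches
out1 = field.render_points("car_0", pts, t=0.04)  # reuses the cached branch
```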
Relational Collaborative Filtering: Modeling Multiple Item Relations for Recommendation
Existing item-based collaborative filtering (ICF) methods leverage only the
relation of collaborative similarity. Nevertheless, there exist multiple
relations between items in real-world scenarios. Distinct from collaborative
similarity, which captures co-interaction patterns from the user perspective,
these relations reveal fine-grained knowledge about items from other
perspectives, such as metadata and functionality. However, how to
incorporate multiple item relations is less explored in recommendation
research. In this work, we propose Relational Collaborative Filtering (RCF), a
general framework to exploit multiple relations between items in recommender
systems. We find that both the relation type and the relation value are crucial
in inferring user preference. To this end, we develop a two-level hierarchical
attention mechanism to model user preference. The first-level attention
discriminates which types of relations are more important, and the second-level
attention considers the specific relation values to estimate the contribution
of a historical item in recommending the target item. To make the item
embeddings reflective of the relational structure between items, we further
formulate a relation-preserving task and train it jointly with the
recommendation task of preference modeling. Empirical results on two real
datasets demonstrate the strong performance of RCF. Furthermore, we also
conduct qualitative analyses to show the benefits of explanations brought by
the modeling of multiple item relations.
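As a minimal sketch of how such a two-level design can be wired up (the scoring functions and tensor shapes are simplified assumptions, not the paper's exact formulation): the first softmax weighs relation types for a user, and the second weighs each historical item connected to the target under a given relation value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelRelationalAttention(nn.Module):
    """Sketch of RCF-style hierarchical attention over relation types and
    relation values; dimensions and scorers are simplified assumptions."""

    def __init__(self, dim):
        super().__init__()
        self.type_att = nn.Linear(2 * dim, 1)    # scores (user, relation type)
        self.value_att = nn.Linear(3 * dim, 1)   # scores (target, value, item)

    def forward(self, user, target, rel_types, rel_values, hist_items):
        # user, target: (d,); rel_types: (R, d);
        # rel_values, hist_items: (R, H, d) for H historical items per type.
        R, H, d = hist_items.shape
        # Level 1: which relation types matter to this user?
        a1 = F.softmax(self.type_att(
            torch.cat([user.expand(R, d), rel_types], -1)).squeeze(-1), dim=0)
        # Level 2: how much does each historical item contribute, given the
        # specific relation value linking it to the target?
        a2 = F.softmax(self.value_att(torch.cat(
            [target.expand(R, H, d), rel_values, hist_items], -1)
        ).squeeze(-1), dim=-1)
        per_type = (a2.unsqueeze(-1) * hist_items).sum(1)   # (R, d)
        profile = (a1.unsqueeze(-1) * per_type).sum(0)      # (d,)
        return (user + profile) @ target                    # preference score

d, R, H = 32, 4, 6
model = TwoLevelRelationalAttention(d)
score = model(torch.randn(d), torch.randn(d), torch.randn(R, d),
              torch.randn(R, H, d), torch.randn(R, H, d))
```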
Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation
Video generation models often operate under the assumption of fixed frame
rates, which leads to suboptimal performance on tasks requiring flexible
frame rates (e.g., increasing the frame rate of the more dynamic portions of a
video, or handling missing frames). To resolve the restricted
nature of existing video generation models' ability to handle arbitrary
timesteps, we propose continuous-time video generation by combining neural ODE
(Vid-ODE) with pixel-level video processing techniques. Using ODE-ConvGRU as an
encoder, a convolutional version of the recently proposed neural ODE that
enables learning continuous-time dynamics, Vid-ODE learns the spatio-temporal
dynamics of input videos with flexible frame rates. The decoder
integrates the learned dynamics function to synthesize video frames at any
given timestep, where a pixel-level composition technique is used to
maintain the sharpness of individual frames. With extensive experiments on four
real-world video datasets, we verify that the proposed Vid-ODE outperforms
state-of-the-art approaches under various video generation settings, both
within the trained time range (interpolation) and beyond the range
(extrapolation). To the best of our knowledge, Vid-ODE is the first work
successfully performing continuous-time video generation using real-world
videos.
Comment: Accepted to AAAI 2021, 22 pages
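Since the neural ODE is what lets Vid-ODE query arbitrary timestamps, a minimal latent-space sketch may help; it assumes the torchdiffeq package, stands in a plain MLP for the paper's ODE-ConvGRU, and omits the encoder, decoder, and pixel-level composition.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # assumes torchdiffeq is installed

class LatentDynamics(nn.Module):
    """dz/dt = f(z): a small MLP standing in for the learned dynamics."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                 nn.Linear(dim, dim))

    def forward(self, t, z):     # odeint expects a function f(t, z)
        return self.net(z)

dyn = LatentDynamics()
z0 = torch.randn(1, 128)         # latent state from an (omitted) video encoder
# Query arbitrary, irregular timestamps, both inside the observed range
# (interpolation) and beyond it (extrapolation): the solver integrates the
# learned dynamics up to each requested time.
t = torch.tensor([0.0, 0.35, 0.5, 1.0, 1.6])
zs = odeint(dyn, z0, t)          # (len(t), 1, 128)
# frames = decoder(zs)           # a frame decoder would map latents to pixels
```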